This vignette will provide basic steps for interacting with RaMP-DB (Relational database of Metabolomic Pathways).
Details on RaMP-DB installation are also available through GitHub (https://github.com/RAMP-project/RAMP). Questions can be asked through the Issues tab or by sending an email to NCATSRaMP@nih.gov.
RaMP-DB supports queries and enrichment analyses. Supported queries are:
Supported enrichment analyses are:
Once installed, first load the package. The first call lists the database versions available in your local file cache and in our remote repository. Next, initialize a RaMP database object. This call references a RaMP-DB version in the local file cache for your current session or, if none is cached, downloads the latest version of the RaMP database. Note that RaMP() accepts a version argument, for instance version='2.3.2'. The supplied version should be one of the versions shown by listAvailableRaMPDbVersions().
library(RaMP)
library(DT) # for prettier tables in vignette
library(dplyr)
library(magrittr)
listAvailableRaMPDbVersions()
## [1] "Locally available versions of RaMP SQLite DB, currently on your computer:"
## [1] "2.6.3" "2.5.4" "2.4.2" "2.4.1" "2.3.1"
## [1] "Available remote RaMP SQLite DB versions for download:"
## [1] "2.5.4" "2.5.0" "2.4.3" "2.4.2" "2.4.0" "2.3.2" "2.3.1"
## [1] "The following RaMP Database versions are available for download:"
## [1] "2.5.0" "2.4.3" "2.4.0" "2.3.2"
## [1] "Use the command db <- RaMP(<new_version_number>) to download the specified version."
# load a local RaMP database or download the latest RaMP database version from the repository.
# If the version is not specified, the latest local version will be used.
# If no local database is cached, the latest remote version will be downloaded.
rampDB <- RaMP(branch="ramp3.0")
Note that it is always preferable to use IDs rather than common names. When entering IDs, prepend each ID with its database of origin followed by a colon, for example kegg:C02712, hmdb:HMDB04824, etc. IDs from multiple sources can be mixed in a single input. RaMP currently supports the following ID types (that should be prepended):
Users can import analyte data from external sources with the function createRaMPInput(), which converts data.frame, .csv, or .xlsx formatted metabolite metadata into the RaMP input format. The input should have ID sources (e.g. hmdb, kegg, entrez) as column names and the corresponding rows filled with IDs from that source.
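As a rough illustration of the layout described above (one column per ID source, rows holding IDs), the following base R sketch builds source-prefixed IDs from a small table. The helper name prefix_ids and the toy table are hypothetical; this is not how createRaMPInput() is implemented, only a picture of the prefix convention.

```r
# Hypothetical toy table: one column per ID source, NA where an ID is missing.
toy_ids <- data.frame(
  hmdb = c("HMDB0000064", "HMDB0000148", NA),
  kegg = c(NA, "C00025", "C00780"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: prepend each column name and a colon to the IDs in
# that column, then drop the entries that were missing.
prefix_ids <- function(df) {
  prefixed <- lapply(names(df), function(src) {
    ids <- df[[src]]
    paste0(src, ":", ids)[!is.na(ids)]
  })
  unlist(prefixed)
}

analyte_ids <- prefix_ids(toy_ids)
print(analyte_ids)
```

A vector in this shape is what the query functions in this vignette take as their analyte input.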
dir <- system.file("extdata", package="RaMP", mustWork=TRUE)
exInput <- file.path(dir, "ExampleRaMPInput.csv")
data2 <- createRaMPInput(filePath = exInput)
testids <- getPathwayFromAnalyte(analytes = data2, db=rampDB)
datatable(testids)
new.data <- distinct(testids, commonName, inputId)
print(new.data)
Users can retrieve analytes from input pathways, retrieve pathways from input analytes, as well as perform pathway enrichment.
Analytes (genes, proteins, metabolites) can be retrieved by pathway. Users have to input the exact pathway name. Here is an example:
## [1] "fired!"
## [1] "Timing .."
## user system elapsed
## 0.593 0.286 1.802
To retrieve information from multiple pathways, input a vector of pathway names:
myanalytes <- getAnalyteFromPathway(pathway=c("Wnt Signaling Pathway",
"sphingolipid metabolism"), db=rampDB)
## [1] "fired!"
## [1] "Timing .."
## user system elapsed
## 0.619 0.054 0.680
It is often useful to get a sense of which pathways are represented in a dataset (this is particularly true for metabolomics, where coverage of metabolites varies depending on the platform used). In other cases, one may be interested in exploring one or several metabolites to see which pathways they are represented in.
In this example, we will search for pathways that involve the two genes MDM2 and TP53, and the two metabolites glutamate and creatine.
pathwaydfids <- getPathwayFromAnalyte(c("ensembl:ENSG00000135679", "hmdb:HMDB0000064","hmdb:HMDB0000148", "ensembl:ENSG00000141510"), db=rampDB)
## [1] "Starting getPathwayFromAnalyte()"
## [1] "Working on ID List..."
## [1] "finished getPathwayFromAnalyte()"
## [1] "Found 328 associated pathways."
Note that each row returns a pathway attributed to one of the input analytes. To retrieve the number of unique pathways returned for all analytes or each analyte, try the following:
print(paste("Number of Unique Pathways Returned for All Analytes:",
length(unique(pathwaydfids$pathwayId))))
## [1] "Number of Unique Pathways Returned for All Analytes: 285"
lapply(unique(pathwaydfids$commonName), function(x) {
(paste("Number of Unique Pathways Returned for",x,":",
length(unique(pathwaydfids[which(pathwaydfids$commonName==x),]$pathwayId))))})
## [[1]]
## [1] "Number of Unique Pathways Returned for L-Glutamic acid,Glutamate : 126"
##
## [[2]]
## [1] "Number of Unique Pathways Returned for TP53 : 144"
##
## [[3]]
## [1] "Number of Unique Pathways Returned for Creatine : 8"
##
## [[4]]
## [1] "Number of Unique Pathways Returned for MDM2 : 50"
RaMP performs pathway and chemical class overrepresentation analysis using Fisher’s exact tests.
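To make the idea of an overrepresentation test concrete, here is a minimal base R sketch of a one-sided Fisher’s exact test for a single pathway. All counts are hypothetical, and the way the contingency table is built is an assumption for illustration; it is not taken from RaMP’s implementation.

```r
# Hypothetical counts, chosen only for illustration.
background_total <- 1000  # analytes in the background
pathway_total    <- 40    # background analytes annotated to the pathway
input_total      <- 20    # analytes in the user's input list
input_in_pathway <- 5     # input analytes annotated to the pathway

# 2x2 contingency table (filled column by column):
# rows = in pathway yes/no, columns = in input list yes/no.
contingency <- matrix(
  c(input_in_pathway,
    input_total - input_in_pathway,
    pathway_total - input_in_pathway,
    background_total - pathway_total - (input_total - input_in_pathway)),
  nrow = 2,
  dimnames = list(in_pathway = c("yes", "no"), in_input = c("yes", "no"))
)

# One-sided test: is the pathway overrepresented in the input list?
fisher_p <- fisher.test(contingency, alternative = "greater")$p.value

# The same tail probability written directly with the hypergeometric distribution.
hyper_p <- phyper(input_in_pathway - 1, pathway_total,
                  background_total - pathway_total, input_total,
                  lower.tail = FALSE)

print(contingency)
print(fisher_p)
```

With an expected overlap of 0.8 analytes (20 * 40 / 1000) and an observed overlap of 5, the test returns a small p-value, which is the signal an enrichment analysis looks for.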
Using the pathways that our analytes map to, captured in the pathwaydfids data frame in the previous step, we can now run Fisher’s Exact test to identify pathways that are enriched for our analytes of interest:
test.inputs <- "kegg:C00780"
fisher.results <- runCombinedFisherTest(analytes = c(
"hmdb:HMDB0000033",
"hmdb:HMDB0000052",
"hmdb:HMDB0000094",
"hmdb:HMDB0000161",
"hmdb:HMDB0000168",
"hmdb:HMDB0000191",
"hmdb:HMDB0000201",
"chemspider:10026",
"hmdb:HMDB0006059",
"Chemspider:6405",
"CAS:5657-19-2",
"hmdb:HMDB0002511",
"chemspider:20171375",
"CAS:133-32-4",
"CAS:5746-90-7",
"CAS:477251-67-5",
"hmdb:HMDB0000695",
"chebi:15934",
"CAS:838-07-3",
"hmdb:HMDBP00789",
"hmdb:HMDBP00283",
"hmdb:HMDBP00284",
"hmdb:HMDBP00850"
), db=rampDB)
Retrieve Pathways From Input: To explicitly view the results of mapping input IDs to RaMP, users can run the getPathwayFromAnalyte() function, as noted above in the section “Retrieve Pathways From Input Analyte(s)”.
Once we have our Fisher results, we can format them into a new dataframe and filter the pathways for significance. For this example we will be using a Holm-adjusted p-value cutoff of 0.05.
#Returning Fisher Pathways and P-Values
filtered.fisher.results <- FilterFishersResults(fishers_df = fisher.results, pval_type = 'holm', pval_cutoff=0.05)
## [1] "Filtering Fisher Results..."
## [1] "Fisher Result Type: Pathway Enrichment"
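The filtering call above passes pval_type = 'holm'. As a small sketch of what a Holm adjustment does compared with a Benjamini-Hochberg (FDR) adjustment, the base R function p.adjust() can be applied to a handful of hypothetical raw p-values. The values below are made up and unrelated to the RaMP results.

```r
# Hypothetical raw p-values, one per tested pathway.
raw_p <- c(0.001, 0.01, 0.02, 0.04, 0.2)

# Holm controls the family-wise error rate; BH controls the false discovery rate.
holm_p <- p.adjust(raw_p, method = "holm")
bh_p   <- p.adjust(raw_p, method = "BH")

comparison <- data.frame(raw = raw_p, holm = holm_p, BH = bh_p)
print(comparison)

# Number of pathways that would pass a 0.05 cutoff on the Holm-adjusted values.
n_sig_holm <- sum(holm_p < 0.05)
print(n_sig_holm)
```

Holm-adjusted values are never smaller than the BH-adjusted ones, so a Holm cutoff is the more conservative of the two filters.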
Because RaMP combines pathways from multiple sources, pathways may be represented more than once. Further, due to the hierarchical nature of pathways and because Fisher’s testing assumes pathways are independent, subpathways and their parent pathways may appear in a list. To help group together pathways that represent similar biological processes, we have implemented a clustering algorithm that groups pathways together if they share analytes in common.
clusters <- findCluster(filtered.fisher.results,
perc_analyte_overlap = 0.2,
min_pathway_tocluster = 2, perc_pathway_overlap = 0.2, db=rampDB)
## [1] "Clustering pathways..."
## [1] "Finished clustering pathways..."
## print("Pathways with Holm-adjusted Pval < 0.05")
datatable(clusters$fishresults %>% mutate_if(is.numeric, ~ round(., 8)),
rownames = FALSE
)
To view clustered pathway results:
pathwayResultsPlot(filtered.fisher.results, text_size = 8, perc_analyte_overlap = 0.2,
min_pathway_tocluster = 2, perc_pathway_overlap = 0.2, interactive = FALSE, db=rampDB)
## [1] "Clustering pathways..."
## [1] "Finished clustering pathways..."
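The clustering step groups pathways that share analytes. The sketch below shows one way such an overlap could be quantified in base R, using hypothetical pathways and an assumed overlap measure (shared analytes divided by the size of the smaller pathway). RaMP’s own measure and clustering procedure may differ; this only illustrates the idea behind a threshold such as perc_analyte_overlap = 0.2.

```r
# Hypothetical pathways, each a vector of analyte IDs (toy data).
pathways <- list(
  pathwayA = c("m1", "m2", "m3", "m4"),
  pathwayB = c("m3", "m4", "m5"),
  pathwayC = c("m6", "m7")
)

# Assumed overlap measure: shared analytes divided by the size of the
# smaller pathway.
overlap_fraction <- function(a, b) {
  length(intersect(a, b)) / min(length(a), length(b))
}

# Pairwise overlap matrix across all pathways.
overlap <- outer(
  seq_along(pathways), seq_along(pathways),
  Vectorize(function(i, j) overlap_fraction(pathways[[i]], pathways[[j]]))
)
dimnames(overlap) <- list(names(pathways), names(pathways))
print(round(overlap, 2))

# Pathway pairs at or above a 0.2 threshold would be candidates for the
# same cluster (upper triangle only, so each pair is listed once).
candidates <- which(overlap >= 0.2 & upper.tri(overlap), arr.ind = TRUE)
print(candidates)
```

In this toy example only pathwayA and pathwayB share analytes, so they are the only candidate pair.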
RaMP contains information on where metabolites originate (for example, the biospecimen). This information is called ontology.
The user can retrieve the metabolites that are associated with a specific ontology or vector of ontologies. Available ontologies include health condition, organ/components, tissue, biofluid, industrial applications and others.
The function getMetaFromOnto() retrieves metabolites that are associated with a certain ontology. It should be noted that it does not matter which ontology the metabolites are from. The function will return all metabolites associated with all the ontologies specified by the user.
ontologies.of.interest <- c("Colon", "Liver", "Lung")
new.metabolites <- getMetaFromOnto(ontology = ontologies.of.interest, db=rampDB)
## [1] "Retreiving Metabolites for input ontology terms."
## [1] "Found 3 ontology term matches."
## [1] "Found 1580 metabolites associated with the input ontology terms."
## [1] "Finished getting metabolies from ontology terms."
To retrieve ontologies that are associated with our metabolites we can use getOntoFromMeta(). This function takes in a vector of metabolites as an input and returns a vector comprised of the ontologies from the user’s defined metabolites.
RaMP has several capabilities for analyzing reactions that involve metabolites. These include retrieving analytes involved in the same reaction, obtaining reaction classes, plotting reaction classes, generating networks from the transcript data, and generating an interactive upset plot of overlapping input compounds at reaction class level 1.
The user may want to know what gene transcripts encode enzymes which can catalyze reactions involving metabolites in their experiment. RaMP can return this data to its user.
We can return the gene transcripts using the rampFastCata() function. To use it the user needs to provide a vector of metabolites they are interested in. Two reaction lists are returned, HMDB analyte associations, as well as Rhea analyte associations.
The user can also input protein IDs or gene transcripts in the vector to return the metabolites involved in chemical reactions with the input proteins or gene transcript encoded proteins.
#Input Metabolites and Proteins
inputs.of.interest <- c("kegg:C00186" , "hmdb:HMDB0000148", "kegg:C00780", "hmdb:HMDB0000064", "ensembl:ENSG00000115850", "uniprot:Q99259")
new.transcripts <- rampFastCata(analytes = inputs.of.interest, db=rampDB)
## [1] "Analyte ID-based reaction partner query."
## [1] "Building metabolite to gene relations."
## [1] "Number of met2gene relations: 132"
## [1] "Building gene to metabolite relations."
## [1] "Total Relation Count: 144"
## [1] "There are no ChEBI metabolite IDs in the input. Skipping metabolite to protein query step."
RaMP can output reaction class and Enzyme Commission numbers (EC numbers) for a collection of input compound ids. The function getReactionClassesForAnalytes() will output this information for the user.
Note: the UpSet plot will include non-enzymatic reactions.
RaMP also has a built-in function, plotReactionClasses(), that generates an interactive plot from the reaction class data. It takes the dataframe created by getReactionClassesForAnalytes() as input.
analytes.of.interest = c('chebi:58115', 'chebi:456215', 'chebi:58245', 'chebi:58450',
'chebi:17596', 'chebi:16335', 'chebi:16750', 'chebi:172878',
'chebi:62286', 'chebi:77897', 'uniprot:P30566','uniprot:P30520',
'uniprot:P00568', 'uniprot:P23109', 'uniprot:P22102', 'uniprot:P15531')
reaction.classes <- getReactionClassesForAnalytes(analytes = analytes.of.interest, db=rampDB)
## [1] "Starting reaction class query..."
## Reporting Function: getReactionClassesForAnalytes
## The input list has 16 IDs.
## The input list has 10 chebi IDs.
## The input list has 6 uniprot IDs.
## [1] "Passed the getReactionClassStats"
## [1] "humanProtein TRUE"
## [1] "Completed reaction class query..."
RaMP has a built-in function, plotCataNetwork(), that generates networks from the transcript data. It takes the dataframe created by rampFastCata() as input. These plots are interactive.
This code section demonstrates a Rhea reaction query.
analytes.of.interest = c('chebi:58115', 'chebi:456215', 'chebi:58245', 'chebi:58450',
'chebi:17596', 'chebi:16335', 'chebi:16750', 'chebi:172878',
'chebi:62286', 'chebi:77897', 'uniprot:P30566','uniprot:P30520',
'uniprot:P00568', 'uniprot:P23109', 'uniprot:P22102', 'uniprot:P15531')
reactionsLists <- getReactionsForAnalytes(analytes = analytes.of.interest, includeTransportRxns = FALSE, humanProtein = TRUE, db=rampDB)
## Running getReactionsForAnalytes()
## Reporting Function: getReactionsForAnalytes
## The input list has 16 IDs.
## The input list has 10 chebi IDs.
## The input list has 6 uniprot IDs.
## Finished getReactionsForAnalytes()
# just show the reactions with at least one metabolite and one protein in common.
datatable(subset(reactionsLists$metProteinCommonReactions))
Three reaction lists are returned: metabolites-to-reactions, proteins-to-reactions, and reactions that have at least one metabolite and one protein from the input analyte list.
After receiving these reactions, the function plotAnalyteOverlapPerRxnLevel() will generate an interactive upset plot of overlapping input compounds at reaction class level 1.
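Conceptually, the third list is the set of reactions that appear in both of the other two. A minimal base R sketch with hypothetical toy tables follows; the table and column names are assumptions for illustration and do not reflect the structure RaMP actually returns.

```r
# Hypothetical toy tables; column names are assumptions, not RaMP's schema.
met_rxns <- data.frame(
  metabolite = c("chebi:A", "chebi:A", "chebi:B"),
  reaction   = c("rxn1", "rxn2", "rxn3"),
  stringsAsFactors = FALSE
)
prot_rxns <- data.frame(
  protein  = c("uniprot:X", "uniprot:Y"),
  reaction = c("rxn2", "rxn4"),
  stringsAsFactors = FALSE
)

# Reactions hit by at least one input metabolite and one input protein.
common_rxns <- intersect(met_rxns$reaction, prot_rxns$reaction)

# Join the two tables on the shared reactions to see which analytes meet there.
common <- merge(met_rxns, prot_rxns, by = "reaction")
print(common)
```

Only rxn2 is shared in this toy example, so the joined table has a single row pairing chebi:A with uniprot:X.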
Users can retrieve chemical classes and chemical property information from input metabolites, as well as perform chemical enrichment from input metabolites.
RaMP incorporates ClassyFire and LIPID MAPS classes. The chemicalClassSurvey() function takes as input a vector of metabolites and outputs the classes associated with each input metabolite.
metabolites.of.interest = c("pubchem:64969", "chebi:16958", "chemspider:20549", "kegg:C05598", "chemspider:388809", "pubchem:53861142", "hmdb:HMDB0001138", "hmdb:HMDB0029412")
chemical.classes <- chemicalClassSurvey(mets = metabolites.of.interest, db=rampDB)
## [1] "Starting Chemical Class Survey"
## [1] "...finished metabolite list query..."
## [1] "...finished DB population query..."
## [1] "...collating data..."
## [1] "...creating query efficiency summary..."
## [1] "Finished Chemical Class Survey"
Chemical properties captured by RaMP include SMILES, InChI, InChI-keys, monoisotopic masses, molecular formula, and common name. The getChemicalProperties() function takes as input a vector of metabolites and outputs a list of chemical property information that can easily be converted into a dataframe.
## Starting Chemical Property Query
## Finished Chemical Property Query
After retrieving chemical classes of metabolites, the chemicalClassEnrichment() function will perform overrepresentation analysis using a Fisher’s test and output classes that show enrichment in the user input list of metabolites relative to the background metabolite population (all metabolites in RaMP). The function performs enrichment analysis for ClassyFire classes, sub-classes, and super-classes, and for LIPID MAPS categories, main classes, and sub-classes.
metabolites.of.interest = c("pubchem:64969", "chebi:16958", "chemspider:20549", "kegg:C05598", "chemspider:388809", "pubchem:53861142", "hmdb:HMDB0001138", "hmdb:HMDB0029412")
chemical.enrichment <- chemicalClassEnrichment(mets = metabolites.of.interest,db=rampDB)
## [1] "Starting Chemical Class Enrichment"
## [1] "Starting Chemical Class Survey"
## [1] "...finished metabolite list query..."
## [1] "...finished DB population query..."
## [1] "...collating data..."
## [1] "...creating query efficiency summary..."
## [1] "Finished Chemical Class Survey"
## [1] "check total summary"
## [1] "getting population totals"
## [1] "Finished Chemical Class Enrichment"
## [1] "ClassyFire_class" "ClassyFire_sub_class" "ClassyFire_super_class"
## [4] "result_type"
# To retrieve results for the ClassyFire Class:
classy_fire_classes <- chemical.enrichment$ClassyFire_class
datatable(classy_fire_classes)
Note: To explicitly view the results of mapping input IDs to RaMP, users can run the chemicalClassSurvey() function, as noted above in the section “Retrieve Chemical Class from Input Metabolites”.
Users can download previous versions of RaMP and run queries against them. Note that some annotations have been added or changed in newer versions.
#Example query for earlier version
Alternate.db <- RaMP('2.3.1')
Alternate.Ramp <- getAnalyteFromPathway(db = Alternate.db, pathway = c('Pentose Phosphate Pathway'))
datatable(Alternate.Ramp)
#Example query for current version
Current.db <- RaMP('2.5.4')
Current.Ramp <- getAnalyteFromPathway(db = Current.db, pathway = c('Pentose Phosphate Pathway'))
datatable(Current.Ramp)
## R version 4.3.2 (2023-10-31)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.6.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] magrittr_2.0.3 dplyr_1.1.4 DT_0.33 RaMP_3.0.2
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.5 xfun_0.47 bslib_0.8.0
## [4] ggplot2_3.5.1 visNetwork_2.1.2 htmlwidgets_1.6.4
## [7] lattice_0.21-9 vctrs_0.6.5 tools_4.3.2
## [10] crosstalk_1.2.1 generics_0.1.3 curl_5.2.2
## [13] Polychrome_1.5.1 tibble_3.2.1 fansi_1.0.6
## [16] RSQLite_2.3.7 highr_0.11 blob_1.2.4
## [19] janeaustenr_1.0.0 pkgconfig_2.0.3 tokenizers_0.3.0
## [22] Matrix_1.6-4 data.table_1.16.0 dbplyr_2.4.0
## [25] scatterplot3d_0.3-44 lifecycle_1.0.4 compiler_4.3.2
## [28] farver_2.1.2 munsell_0.5.1 htmltools_0.5.8.1
## [31] SnowballC_0.7.1 sass_0.4.9 yaml_2.3.10
## [34] lazyeval_0.2.2 tidytext_0.4.2 plotly_4.10.4
## [37] pillar_1.9.0 jquerylib_0.1.4 tidyr_1.3.1
## [40] upsetjs_1.11.1 cachem_1.1.0 tidyselect_1.2.1
## [43] digest_0.6.37 stringi_1.8.4 purrr_1.0.2
## [46] labeling_0.4.3 fastmap_1.2.0 grid_4.3.2
## [49] colorspace_2.1-1 cli_3.6.3 utf8_1.2.4
## [52] withr_3.0.1 filelock_1.0.2 scales_1.3.0
## [55] bit64_4.0.5 rmarkdown_2.28 httr_1.4.7
## [58] bit_4.0.5 memoise_2.0.1 evaluate_0.24.0
## [61] knitr_1.48 viridisLite_0.4.2 BiocFileCache_2.10.1
## [64] rlang_1.1.4 Rcpp_1.0.13 glue_1.7.0
## [67] DBI_1.2.3 rstudioapi_0.16.0 jsonlite_1.8.8
## [70] R6_2.5.1